The Informative Vector Selection in Active Learning using Divisive Analysis

Authors

  • Zareen Sharf
  • Maryam Razzak
Abstract

Traditional supervised machine learning techniques require training on large volumes of data to achieve efficiency and accuracy. In contrast, active learning systems significantly reduce the size of the training data because the data are selected according to a strong mathematical model. This makes it possible to reach the same accuracy as baseline techniques with a considerably smaller training dataset. In this paper, the active learning approach is implemented with a modification to the traditional active learning system based on the version space algorithm: the version space concept is replaced with the divisive analysis (DIANA) algorithm, and the core idea is to pre-cluster the instances before distributing them into training and testing data. The results obtained by our system support our reasoning that pre-clustering, instead of the traditional version space algorithm, can improve the classification accuracy of the overall system. Two types of data were tested: binary-class and multi-class. The proposed system worked well on the multi-class data, but on the binary data the version space algorithm was more accurate.

Keywords: active learning; machine learning; pre-clustering; semi-supervised learning
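The pre-clustering idea described in the abstract can be sketched roughly as follows. This is only a minimal illustration, not the authors' implementation: it assumes Euclidean distance, a fixed number of clusters, and that the instance sent for labeling is each cluster's medoid. None of these choices is stated in the abstract, and the function names (`diana`, `split`, `medoid`) are hypothetical.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Sequence[float]


def avg_dist(i: int, group: List[int], pts: List[Point]) -> float:
    """Mean distance from point i to the members of group other than i."""
    others = [j for j in group if j != i]
    if not others:
        return 0.0
    return sum(dist(pts[i], pts[j]) for j in others) / len(others)


def diameter(group: List[int], pts: List[Point]) -> float:
    """Largest pairwise distance inside a cluster."""
    return max((dist(pts[i], pts[j]) for i in group for j in group), default=0.0)


def split(group: List[int], pts: List[Point]) -> Tuple[List[int], List[int]]:
    """One divisive step: peel a splinter group off the given cluster."""
    rest = list(group)
    # Seed the splinter group with the most dissimilar point.
    seed = max(rest, key=lambda i: avg_dist(i, rest, pts))
    rest.remove(seed)
    splinter = [seed]
    while len(rest) > 1:
        # A point moves if it is, on average, closer to the splinter group.
        gains = {i: avg_dist(i, rest, pts) - avg_dist(i, splinter, pts) for i in rest}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        rest.remove(best)
        splinter.append(best)
    return splinter, rest


def diana(pts: List[Point], k: int) -> List[List[int]]:
    """Divide the data top-down until k clusters exist (assumed stopping rule)."""
    clusters = [list(range(len(pts)))]
    while len(clusters) < k:
        widest = max(clusters, key=lambda g: diameter(g, pts))
        if len(widest) < 2:
            break
        clusters.remove(widest)
        clusters.extend(split(widest, pts))
    return clusters


def medoid(group: List[int], pts: List[Point]) -> int:
    """Assumed 'informative' instance: the most central point of a cluster."""
    return min(group, key=lambda i: avg_dist(i, group, pts))


# Usage: cluster unlabeled points first, then pick one instance per cluster to label.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters = diana(points, k=2)
to_label = sorted(medoid(c, points) for c in clusters)
```

In this sketch the clustering happens before any train/test split, so the labeled set is drawn from every region of the data rather than from a random sample.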

Related resources

Stock Price Prediction using Machine Learning and Swarm Intelligence

Background and Objectives: Stock price prediction has become one of the interesting and also challenging topics for researchers in the past few years. Due to the non-linear nature of the time-series data of the stock prices, mathematical modeling approaches usually fail to yield acceptable results. Therefore, machine learning methods can be a promising solution to this problem. Methods: In this...


Feature Selection Using Multi Objective Genetic Algorithm with Support Vector Machine

Different approaches have been proposed for feature selection to obtain a suitable feature subset among all features. These methods search the feature space for feature subsets which satisfy some criteria or optimize several objective functions. The objective functions are divided into two main groups: filter and wrapper methods. In filter methods, feature subsets are selected due to some measu...


A Divisive Information-Theoretic Feature Clustering Algorithm for Text Classification

High dimensionality of text can be a deterrent in applying complex learners such as Support Vector Machines to the task of text classification. Feature clustering is a powerful alternative to feature selection for reducing the dimensionality of text data. In this paper we propose a new information-theoretic divisive algorithm for feature/word clustering and apply it to text classification. Exist...


Identification of Alzheimer disease-relevant genes using a novel hybrid method

Identifying genes underlying complex diseases/traits that generally involve multiple etiological mechanisms and contributing genes is difficult. Although microarray technology has enabled researchers to investigate gene expression changes, identifying pathobiologically relevant genes remains a challenge. To address this challenge, we apply a new method for selecting the disease-relevant gen...


Query Learning with Large Margin Classifiers

The active selection of instances can significantly improve the generalisation performance of a learning machine. Large margin classifiers such as support vector machines classify data using the most informative instances (the support vectors). This makes them natural candidates for instance selection strategies. In this paper we propose an algorithm for the training of support vector machines u...


Publication date: 2017